Genetic Network Programming Based Rule Accumulation for Agent Control
نویسنده
چکیده
Multi-agent control is a hot but challenging research topic which covers many research fields, such as evolutionary computation, machine learning, neural networks, etc.. Various approaches have been proposed to guide agents’ actions in different environments. Evolutionary Computation (EC) is often chosen to control agents’ behaviors since it can generate the best control policy through evolution. As a powerful machine learning approach, Reinforcement Learning (RL) is competent for agent control since it enables agents to directly interact with environments and get rewards through “trial and error”s. It is fast and efficient in dealing with some simple problems. However, its state-action pairs may become exponentially large in complex environments, which is computationally intractable. Neural Networks (NNs) could also be used to guide agents’ actions since it can map between the input of the environment information and the output of control signals. However, in some high dynamic environments which are uncertain and changing all the time, NN could not work. Genetic Network Programming is a newly developed EC method which chooses the directed graph structure as its chromosome. High expression ability of the graph structure, reusability of nodes and realization of partially observable processes enable GNP to deal with many complex problems in dynamic environments. One of the disadvantages of GNP is that its gene structure may become too complicated after generations of the training. In the testing phase, it might not be able to adapt to the new environment easily and its generalization ability is not good. This is because the implicit memory function of GNP can not memorize enough information of the environment, which is incompetent in dealing with the agent control problems in high dynamic environments. Therefore, explicit memory should be associated with GNP in order to explore its full potential. Various research has revealed that memory schemes could enhance EC in dynamic environments. This is because storing the useful historical information into the memory could improve the search ability of EC. Inspired by this idea, a GNP-based explicit memory scheme named “Genetic Network Programming with Rule Accumulation” is proposed in this thesis. Focusing on this issue, it is studied in the following chapters of this thesis how to create action rules and use them for agent control, how to improve the performance in Non-Markov environments, how to prune the bad rules to improve the quality of the rule pool, how to build a rule pool adapting to the environment changes and how to create more general rules for agent control in dynamic environments. The organization of this thesis is as follows. Chapter 1 describes the research background, problems to be solved and outline of the thesis. Some classical methods in the domain of evolutionary computation and reinforcement learning are reviewed. Chapter 2 designs the general framework of GNP-RA, which contains two stages, the training stage and the testing stage. In the training stage, the node transitions of GNP are recorded as rules and stored into the rule pool generation by generation. In the testing stage, all the rules in the rule pool are used to determine agents’ actions through a unique matching degree calculation. The very different point of GNP-RA from the basic GNP is that GNP-RA uses a great number of rules to determine agents’ actions. However, GNP could use only one rule corresponding to its node transition. Therefore, the generalization ability of GNP-RA is better than that of GNP. Moreover, GNP-RA could make use of the previous experiences in the form of rules to determine agents’ current action, which means that GNP-RA could learn from agents’ past behaviors. This also helps the current agent to take correct actions and improve its performance. Simulations on the tile-world demonstrate that GNP-RA could achieve higher fitness values and better generalization ability. Chapter 3 aims to solve the perceptual aliasing problem and improve the performance for multi-agent control in non-Markov environments. The perceptual aliasing problem refers to that different situations seem identical to agents, but different optimal actions are required. In order to solve this problem, a new rule-based model, “GNP with multi-order rule accumulation (GNP-MRA)” is proposed in this chapter. Each multi-order rule records not only the current environment information and agent’s actions, but also the previous environment information and agent’s actions, which helps agents to distinguish the aliasing situations and take proper actions. Simulation results prove the effectiveness of GNP-MRA, and reveal that the higher the rule order is, the more information it can record, and the more easily agents can distinguish different aliasing situations. Therefore, multi-order rules are more efficient for agent control in non-Markov environments. Chapter 4 focuses on how to improve the quality of the rule pool. Two improvements are made in order to realize this. One of them is that during the rule generation, reinforcement learning is combined with evolution in order to create more efficient rules. The obtained knowledge during the running of the program could be used to select the proper processing for judgments, i.e., better rules. The second approach is that after the rule generation, a unique rule pruning method using bad individuals is proposed. The basic idea is that good individuals are used as rule generators and bad individuals are used as monitors to filter the generated bad rules. Four pruning methods are proposed and their performances are discussed and compared. After pruning the bad rules, the good rules could stand out and contribute to better performances. Simulation results demonstrate the efficiency and effectiveness of the enhanced rule-based model. Chapter 5 is devoted to improve the adaptability of GNP-RA depending on the environment changes. GNP-RA has poor adaptability to the dynamic environments since it always uses the old rules in the rule pool for agent control. Generally speaking, the environment keeps changing all the time, while the rules in the rule pool remain the same. Therefore, the old rules in the rule pool become incompetent to guide agents’ actions in the new environments. In order to solve this problem, Sarsa-learning is used as a tool to update the old rules to cope with the inexperienced situations in the new environments. The basic idea is that when evolution ends, the elite individual of GNP-RA still execute Sarsa-learning to update the Q table. With the changes of the Q table, the node transitions could be changed in accordance with the environment, bringing some new rules. These rules are used to update the rule pool, so that the rule pool could adapt to the changing environments. Chapter 6 tries to improve the generalization ability of GNP-RA by pruning the harmful nodes. In order to realize this, “Credit GNP”is proposed in this chapter. Firstly, Credit GNP has a unique structure, where each node has an additional “credit branch”which can be used to skip the harmful nodes. This gene structure has more exploration ability than the conventional GNP-RA. Secondly, Credit GNP combines evolution and reinforcement learning, i.e., off-line evolution and on-line learning to prune the harmful nodes. Which node to prune and how many nodes to prune are determined automatically considering different environments. Thirdly, Credit GNP could select the really useful nodes and prune the harmful ones dynamically and flexibly considering different situations. Therefore, Credit GNP could determine the optimal size of the program along with the changing environments. Simulation results demonstrated that Credit GNP could generate not only more compact programs, but also more general rules. The generalization ability of GNP-RA was improved by Credit GNP. Chapter 7 makes conclusions of this thesis by describing the achievements of the proposed methods based on the simulation results.
منابع مشابه
Bedload transport predictions based on field measurement data by combination of artificial neural network and genetic programming
Bedload transport is an essential component of river dynamics and estimation of its rate is important to many aspects of river management. In this study, measured bedload by Helley- Smith sampler was used to estimate the bedload transport of Kurau River in Malaysia. An artificial neural network, genetic programming and a combination of genetic programming and a neural network were used to estim...
متن کاملAdaptive Rule-Base Influence Function Mechanism for Cultural Algorithm
This study proposes a modified version of cultural algorithms (CAs) which benefits from rule-based system for influence function. This rule-based system selects and applies the suitable knowledge source according to the distribution of the solutions. This is important to use appropriate influence function to apply to a specific individual, regarding to its role in the search process. This rule ...
متن کاملBedload transport predictions based on field measurement data by combination of artificial neural network and genetic programming
Bedload transport is an essential component of river dynamics and estimation of its rate is important to many aspects of river management. In this study, measured bedload by Helley- Smith sampler was used to estimate the bedload transport of Kurau River in Malaysia. An artificial neural network, genetic programming and a combination of genetic programming and a neural network were used to estim...
متن کاملAn Application of Genetic Network Programming Model for Pricing of Basket Default Swaps (BDS)
The credit derivatives market has experienced remarkable growth over the past decade. As such, there is a growing interest in tools for pricing of the most prominent credit derivative, the credit default swap (CDS). In this paper, we propose a heuristic algorithm for pricing of basket default swaps (BDS). For this purpose, genetic network programming (GNP), which is one of the recent evolutiona...
متن کاملOptimization of Dez dam reservoir operation using genetic algorithm
Water reservoir programming studies aim to determine the final cultivated land area based on predefined agricultural models and water requirements. Dam utilization rule curve is also provided in such studies. The system of Dez dam water resources was simulated applying the basic information in order to determine the capability of its reservoir to provide the objectives of the performed plan. As...
متن کاملA Mixed Integer Programming Approach to Optimal Feeder Routing for Tree-Based Distribution System: A Case Study
A genetic algorithm is proposed to optimize a tree-structured power distribution network considering optimal cable sizing. For minimizing the total cost of the network, a mixed-integer programming model is presented determining the optimal sizes of cables with minimized location-allocation cost. For designing the distribution lines in a power network, the primary factors must be considered as m...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2012